Random variable

In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a measurable function from a probability space, typically to the real numbers. (For finite probability spaces in which every subset is an event, the measurability requirement is superfluous.) Intuitively, a random variable is a numerical description of the outcome of an experiment (e.g., the possible results of rolling two dice: (1, 1), (1, 2), etc.). Random variables can be classified as either discrete (a random variable that may assume either a finite number of values or an infinite sequence of values) or continuous (a variable that may assume any numerical value in an interval or collection of intervals). A random variable's possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the potential values of a quantity whose already-existing value is uncertain (e.g., as a result of incomplete information or imprecise measurements). A random variable can also be thought of as a quantity whose value is not fixed, but which can take on different values; a probability distribution is used to describe the probabilities of different values occurring. Realizations of a random variable are called random variates.

Random variables are usually real-valued, but one can consider arbitrary types such as boolean values, complex numbers, vectors, matrices, sequences, trees, sets, shapes, manifolds, functions, and processes. The term random element is used to encompass all such related concepts. A related concept is the stochastic process, a set of indexed random variables (typically indexed by time or space).

Introduction

Real-valued random variables (those whose range is the real numbers) are used in the sciences to make predictions based on data obtained from scientific experiments. In addition to scientific applications, random variables were developed for the analysis of games of chance and stochastic events. In such instances, the function that maps the outcome to a real number is often the identity function or similarly trivial function, and not explicitly described. In many cases, however, it is useful to consider random variables that are functions of other random variables, and then the mapping function included in the definition of a random variable becomes important. As an example, the square of a random variable distributed according to a standard normal distribution is itself a random variable, with a chi-squared distribution. One way to think of this is to imagine generating a large number of samples from a standard normal distribution, squaring each one, and plotting a histogram of the values observed. With enough samples, the graph of the histogram will approximate the density function of a chi-squared distribution with one degree of freedom.
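The thought experiment above can be tried numerically. The following is a minimal sketch, not a definitive implementation: it compares cumulative fractions instead of plotting a histogram, and the function names, sample size and seed are illustrative assumptions.

```python
import math
import random

def squared_normal_samples(n, seed=0):
    """Draw n standard normal samples and return their squares."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    return [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]

def chi2_1_cdf(y):
    """CDF of the chi-squared distribution with one degree of freedom."""
    # P(X^2 <= y) = P(|X| <= sqrt(y)) = erf(sqrt(y / 2)) for standard normal X.
    return math.erf(math.sqrt(y / 2.0)) if y > 0 else 0.0

samples = squared_normal_samples(100_000)
for y in (0.5, 1.0, 2.0, 4.0):
    empirical = sum(s <= y for s in samples) / len(samples)
    print(f"y={y}: empirical {empirical:.4f}, chi-squared(1) CDF {chi2_1_cdf(y):.4f}")
```

With enough samples the two columns should agree to a couple of decimal places.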

Another example is the sample mean, which is the average of a number of samples. When these samples are independent observations of the same random event they can be called independent identically distributed random variables. Since each sample is a random variable, the sample mean is a function of random variables and hence a random variable itself, whose distribution can be computed and properties determined.
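As a rough illustration of the sample mean being a random variable in its own right, the sketch below repeatedly averages n rolls of a fair die. The die, n = 10, the number of repetitions and the seed are assumptions chosen for the example.

```python
import random
import statistics

def sample_mean_of_die_rolls(n, rng):
    """Return the average of n independent fair-die rolls."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)
n = 10
means = [sample_mean_of_die_rolls(n, rng) for _ in range(20_000)]

# Each entry of `means` is one realization of the sample-mean random variable.
print("average of sample means:", statistics.fmean(means))       # close to 3.5
print("variance of sample means:", statistics.pvariance(means))  # close to (35/12)/n
```

The spread of the sample means shrinks as n grows, because the variance of the mean of n independent rolls is the variance of a single roll divided by n.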

One of the reasons that real-valued random variables are so commonly considered is that the expected value (a type of average) and variance (a measure of the "spread", or extent to which the values are dispersed) of the variable can be computed.

There are several types of random variables; the two most common are the discrete and the continuous.[1] A discrete random variable maps outcomes to values of a countable set (e.g., the integers), with each value in the range having probability greater than or equal to zero. A continuous random variable maps outcomes to values of an uncountable set (e.g., the real numbers). For a continuous random variable, the probability of any specific value is zero, whereas the probability of some infinite set of values (such as an interval of non-zero length) may be positive. A random variable can be "mixed", with part of its probability spread out over an interval like a typical continuous variable, and part of it concentrated on particular values like a discrete variable. These classifications correspond to the categorization of probability distributions.

The expected value of random vectors, random matrices, and similar aggregates of fixed structure is defined as the aggregation of the expected value computed over each individual element. The concept of "variance of a random vector" is normally expressed through a covariance matrix. No generally agreed-upon definition of expected value or variance exists for cases other than those just discussed.
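To make the element-wise definition concrete, here is a small sketch that computes the mean vector and covariance matrix of a two-component random vector exactly. The particular vector (first roll, sum of two rolls) and all names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of fair-die rolls, each with probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def vector(outcome):
    """Random vector (first roll, sum of both rolls)."""
    a, b = outcome
    return (a, a + b)

def expectation(f):
    """Exact expected value of f(outcome) under the uniform measure."""
    return sum(p * f(w) for w in outcomes)

dim = 2
# The mean of the vector is the vector of component means.
mean = [expectation(lambda w, i=i: vector(w)[i]) for i in range(dim)]
# Entry (i, j) of the covariance matrix is E[(V_i - E V_i)(V_j - E V_j)].
cov = [[expectation(lambda w, i=i, j=j:
                    (vector(w)[i] - mean[i]) * (vector(w)[j] - mean[j]))
        for j in range(dim)] for i in range(dim)]

print("mean vector:", mean)
print("covariance matrix:", cov)
```

The diagonal of the covariance matrix holds the variances of the individual components, and the off-diagonal entries record how the components vary together.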

Examples

The possible outcomes for one coin toss can be described by the state space \Omega = {heads, tails}. We can introduce a real-valued random variable Y as follows:


    Y(\omega) = \begin{cases}
          1, & \text{if} \ \ \omega = \text{heads} ,\\
          0, & \text{if} \ \ \omega = \text{tails} .
        \end{cases}

If the coin is equally likely to land on either side then Y has a probability mass function given by:

\rho_Y(y) = \begin{cases}\frac{1}{2},& \text{if }y=1,\\
\frac{1}{2},& \text{if }y=0.\end{cases}
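The coin example can be written out directly as a function on the sample space. In this sketch the dictionary P and the helper pmf are illustrative names; the helper simply adds up the probability of all outcomes that map to the same value.

```python
from collections import defaultdict
from fractions import Fraction

# Probability measure on the sample space of a fair coin.
P = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

def Y(omega):
    """The random variable from the text: 1 for heads, 0 for tails."""
    return 1 if omega == "heads" else 0

def pmf(random_variable, measure):
    """Push the measure forward through the random variable."""
    masses = defaultdict(Fraction)
    for omega, prob in measure.items():
        masses[random_variable(omega)] += prob
    return dict(masses)

print(pmf(Y, P))
```

The same helper works for any finite sample space, for instance a die with a random variable that records only whether the roll is odd.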

A random variable can also be used to describe the process of rolling a die and the possible outcomes. The most obvious representation is to take the set \Omega = {1, 2, 3, 4, 5, 6} as the state space, defining the random variable X equal to the number rolled. In this case,

X(\omega) = \begin{cases}1,& \text{if a 1 is rolled} ,\\
2,& \text{if a 2 is rolled} ,\\
3,& \text{if a 3 is rolled} ,\\
4,& \text{if a 4 is rolled} ,\\
5,& \text{if a 5 is rolled} ,\\
6,& \text{if a 6 is rolled} .\end{cases}

If the die is fair, then X has a probability mass function given by:

\rho_X(x) = \begin{cases}\frac{1}{6},& \text{if }x=1,2,3,4,5,6,\\
0,& \text{otherwise} .\end{cases}

An example of a continuous random variable would be one based on a spinner that can choose a horizontal direction. Then the values taken by the random variable are directions. We could represent these directions by North West, East South East, etc. However, it is commonly more convenient to map the sample space to a random variable which takes values which are real numbers. This can be done, for example, by mapping a direction to a bearing in degrees clockwise from North. The random variable then takes values which are real numbers from the interval [0, 360), with all parts of the range being "equally likely". In this case, X = the angle spun. Any real number has probability zero of being selected, but a positive probability can be assigned to any range of values. For example, the probability of choosing a number in [0, 180] is ½. Instead of speaking of a probability mass function, we say that the probability density of X is 1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the set by 1/360. In general, the probability of a set for a given continuous random variable can be calculated by integrating the density over the given set.
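For the spinner, multiplying the measure of a set by the density 1/360 can be sketched as follows. The function name is hypothetical, and the set is assumed to be given as a list of disjoint intervals inside [0, 360).

```python
def spinner_probability(intervals):
    """P(X in a union of disjoint intervals within [0, 360)) for uniform X."""
    density = 1 / 360  # constant probability density of the bearing
    measure = sum(b - a for a, b in intervals)  # total length of the set
    return density * measure

print(spinner_probability([(0, 180)]))            # half of the circle
print(spinner_probability([(0, 45), (90, 135)]))  # two disjoint eighths
print(spinner_probability([(30, 30)]))            # a single point has length 0
```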

An example of a random variable of mixed type would be based on an experiment where a coin is flipped and the spinner is spun only if the result of the coin toss is heads. If the result is tails, X = −1; otherwise X = the value of the spinner as in the preceding example. There is a probability of ½ that this random variable will have the value −1. Other ranges of values would have half the probability of the last example.
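A quick simulation of this mixed-type variable, offered as a sketch with an assumed seed and sample size, shows both the point mass at −1 and the halved probabilities on the continuous part.

```python
import random

def mixed_variable(rng):
    """One realization: -1 on tails, otherwise a uniform bearing in degrees."""
    if rng.random() < 0.5:          # tails
        return -1.0
    return rng.uniform(0.0, 360.0)  # heads: spin the spinner

rng = random.Random(0)
draws = [mixed_variable(rng) for _ in range(100_000)]

p_atom = sum(x == -1.0 for x in draws) / len(draws)
p_half_circle = sum(0.0 <= x <= 180.0 for x in draws) / len(draws)
print("P(X = -1)        ~", p_atom)         # close to 1/2
print("P(0 <= X <= 180) ~", p_half_circle)  # close to 1/4
```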

Formal definition

Let (Ω, ℱ, P) be a probability space and (E, ℰ) a measurable space. Then an (E, ℰ)-valued random variable is a function X: Ω→E which is (ℱ, ℰ)-measurable. The latter means that, for every set B ∈ ℰ, its preimage X^{-1}(B) ∈ ℱ, where X^{-1}(B) = {ω: X(ω) ∈ B}.[2] This definition enables us to measure any set B ∈ ℰ in the target space by looking at its preimage, which by assumption is measurable.
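On a finite sample space the measurability condition can be checked by brute force. The sketch below is only an illustration: the four-point space, the coarse σ-algebra F and the functions X and Z are all assumptions chosen to show one measurable and one non-measurable map.

```python
from itertools import chain, combinations

def preimage(X, omega_space, B):
    """Return X^{-1}(B) = {omega : X(omega) in B} as a frozenset."""
    return frozenset(w for w in omega_space if X(w) in B)

def is_measurable(X, omega_space, F, E_sets):
    """Check that the preimage of every set in E_sets belongs to F."""
    return all(preimage(X, omega_space, B) in F for B in E_sets)

def power_set(values):
    """All subsets of a finite collection, as frozensets."""
    items = list(values)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

omega_space = {1, 2, 3, 4}
# A coarse sigma-algebra that can only tell {1, 2} apart from {3, 4}.
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega_space)}
E_sets = power_set({0, 1})

def X(w):  # constant on the blocks of F, hence measurable
    return 0 if w <= 2 else 1

def Z(w):  # parity splits the blocks, so Z is not F-measurable
    return w % 2

print(is_measurable(X, omega_space, F, E_sets))
print(is_measurable(Z, omega_space, F, E_sets))
```

If F is replaced by the full power set of the sample space, every function passes the check.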

When E is a topological space, the most common choice for the σ-algebra ℰ is the Borel σ-algebra ℬ(E), which is the σ-algebra generated by the collection of all open sets in E. In such a case the (E, ℰ)-valued random variable is called an E-valued random variable. Moreover, when the space E is the real line ℝ, such a real-valued random variable is simply called a random variable.

Real-valued random variables

In this case the observation space is the real numbers. Recall, (\Omega, \mathcal{F}, P) is the probability space. For real observation space, the function X: \Omega \rightarrow \mathbb{R} is a real-valued random variable if

\{ \omega : X(\omega) \le r \} \in \mathcal{F} \qquad \forall r \in \mathbb{R}.

This definition is a special case of the above because \{(-\infty, r]: r \in \mathbb{R}\} generates the Borel sigma-algebra on the real numbers, and it is enough to check measurability on a generating set. (Here we are using the fact that \{ \omega : X(\omega) \le r \} = X^{-1}((-\infty, r]).)

Distribution functions of random variables

Associating a cumulative distribution function (CDF) with a random variable is a generalization of assigning a value to a variable. If the CDF is a (right continuous) Heaviside step function then the variable takes on the value at the jump with probability 1. In general, the CDF specifies the probability that the variable takes a value at or below any given threshold.

If a random variable X: \Omega \to \mathbb{R} defined on the probability space (\Omega, \mathcal{F}, P) is given, we can ask questions like "How likely is it that the value of X is bigger than 2?". This is the same as the probability of the event \{ \omega : X(\omega) > 2 \}, which is often written as P(X > 2) for short, and is easily obtained since P(X > 2) = 1 - P(X \le 2).

Recording all these probabilities of output ranges of a real-valued random variable X yields the probability distribution of X. The probability distribution "forgets" about the particular probability space used to define X and only records the probabilities of various values of X. Such a probability distribution can always be captured by its cumulative distribution function

F_X(x) = \operatorname{P}(X \le x)

and sometimes also using a probability density function. In measure-theoretic terms, we use the random variable X to "push-forward" the measure P on Ω to a measure dF on R. The underlying probability space Ω is a technical device used to guarantee the existence of random variables, and sometimes to construct them. In practice, one often disposes of the space Ω altogether and just puts a measure on R that assigns measure 1 to the whole real line, i.e., one works with probability distributions instead of random variables.
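As a small worked case, the sketch below builds the cumulative distribution function of a fair die from its probability mass function and uses it to answer the question about P(X > 2). The fair die is an assumed example, not one fixed by the discussion above.

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die

def cdf(x):
    """F_X(x) = P(X <= x), computed from the probability mass function."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

p_greater_than_2 = 1 - cdf(2)  # P(X > 2) = 1 - P(X <= 2)
print(cdf(2), p_greater_than_2)
```

Note that the function cdf uses only the distribution of X; the underlying sample space plays no role once the probabilities of the values are known.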

Moments

The probability distribution of a random variable is often characterised by a small number of parameters, which also have a practical interpretation. For example, it is often enough to know what its "average value" is. This is captured by the mathematical concept of expected value of a random variable, denoted E[X], and also called the first moment. In general, E[f(X)] is not equal to f(E[X]). Once the "average value" is known, one could then ask how far from this average value the values of X typically are, a question that is answered by the variance and standard deviation of a random variable. E[X] can be viewed intuitively as an average obtained from an infinite population, the members of which are particular evaluations of X.
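The remark that E[f(X)] generally differs from f(E[X]) can be checked on a simple case. This sketch assumes a fair die and takes f(x) = x^2; both choices are illustrative.

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die

def expectation(f):
    """E[f(X)] for the discrete random variable X with the pmf above."""
    return sum(f(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x)                    # E[X]
second_moment = expectation(lambda x: x ** 2)      # E[X^2]
variance = expectation(lambda x: (x - mean) ** 2)  # E[(X - E[X])^2]

print("E[X]   =", mean)
print("E[X^2] =", second_moment, "but (E[X])^2 =", mean ** 2)
print("Var(X) =", variance)
```

The gap between E[X^2] and (E[X])^2 is exactly the variance.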

Mathematically, this is known as the (generalised) problem of moments: for a given class of random variables X, find a collection {fi} of functions such that the expectation values E[fi(X)] fully characterise the distribution of the random variable X.

Functions of random variables

If we have a random variable X on \Omega and a Borel measurable function g: \mathbb{R} \rightarrow \mathbb{R}, then Y = g(X) will also be a random variable on \Omega, since the composition of measurable functions is also measurable. (However, this is not necessarily true if g is only Lebesgue measurable.) The same procedure that allowed one to go from a probability space (\Omega, P) to (\mathbb{R}, dF_{X}) can be used to obtain the distribution of Y. The cumulative distribution function of Y is

F_Y(y) = \operatorname{P}(g(X) \le y).

If the function g is invertible, i.e. g^{-1} exists, and increasing, then the previous relation can be extended to obtain

F_Y(y) = \operatorname{P}(g(X) \le y) = \operatorname{P}(X \le g^{-1}(y)) = F_X(g^{-1}(y))

and, again with the same hypotheses of invertibility of g, assuming also differentiability, we can find the relation between the probability density functions by differentiating both sides with respect to y, in order to obtain

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d g^{-1}(y)}{d y} \right| .
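The density relation can be sanity-checked numerically. The sketch below assumes X is standard normal and g(x) = exp(x), so that g^{-1}(y) = log(y) for y > 0; it compares the formula with a finite-difference derivative of F_Y. These choices and the step size h are assumptions made for the example.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# g(x) = exp(x) is increasing and differentiable, with inverse log(y) for y > 0.
def cdf_Y(y):
    return normal_cdf(math.log(y))                 # F_Y(y) = F_X(g^{-1}(y))

def pdf_Y(y):
    return normal_pdf(math.log(y)) * abs(1.0 / y)  # f_X(g^{-1}(y)) |d g^{-1}/dy|

def numerical_derivative(f, y, h=1e-5):
    """Central finite-difference approximation of f'(y)."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

for y in (0.5, 1.0, 3.0):
    print(y, pdf_Y(y), numerical_derivative(cdf_Y, y))
```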

If g is not invertible but each y admits at most a countable number of roots (i.e. a finite, or countably infinite, number of x_i such that y = g(x_i)), then the previous relation between the probability density functions can be generalized to

f_Y(y) = \sum_{i} f_X(g_{i}^{-1}(y)) \left| \frac{d g_{i}^{-1}(y)}{d y} \right|

where x_i = g_i^{-1}(y). The formulas for densities do not require g to be increasing.
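As an illustration of the sum over roots, take g(x) = x^2 with X standard normal (an assumed example). Each y > 0 has the two roots +sqrt(y) and −sqrt(y), and the resulting density should match the chi-squared density with one degree of freedom.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pdf_of_square(y):
    """Density of Y = X^2 via the sum over the roots +sqrt(y) and -sqrt(y)."""
    if y <= 0:
        return 0.0
    root = math.sqrt(y)
    jacobian = 1.0 / (2.0 * root)  # |d/dy (+-sqrt(y))| is the same for both roots
    return (normal_pdf(root) + normal_pdf(-root)) * jacobian

def chi2_1_pdf(y):
    """Chi-squared density with one degree of freedom, for y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

for y in (0.1, 1.0, 2.5):
    print(y, pdf_of_square(y), chi2_1_pdf(y))
```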

Example 1

Let X be a real-valued, continuous random variable and let Y = X^2.

F_Y(y) = \operatorname{P}(X^2 \le y).

If y < 0, then P(X^2 ≤ y) = 0, so

F_Y(y) = 0\qquad\hbox{if}\quad y < 0.

If y ≥ 0, then

\operatorname{P}(X^2 \le y) = \operatorname{P}(|X| \le \sqrt{y})
 = \operatorname{P}(-\sqrt{y} \le  X \le \sqrt{y}),

so

F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y})\qquad\hbox{if}\quad y \ge 0.
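The formula of Example 1 can be compared with a simulation. The sketch below assumes, purely for illustration, that X is uniform on [−1, 2]; the seed and sample size are likewise arbitrary.

```python
import random

def cdf_X(x):
    """CDF of X uniform on [-1, 2] (an assumed example distribution)."""
    return min(1.0, max(0.0, (x + 1.0) / 3.0))

def cdf_Y(y):
    """CDF of Y = X^2 from the formula F_X(sqrt(y)) - F_X(-sqrt(y))."""
    if y < 0:
        return 0.0
    root = y ** 0.5
    return cdf_X(root) - cdf_X(-root)

rng = random.Random(0)
squares = [rng.uniform(-1.0, 2.0) ** 2 for _ in range(100_000)]
for y in (0.25, 1.0, 2.25):
    empirical = sum(s <= y for s in squares) / len(squares)
    print(f"y={y}: formula {cdf_Y(y):.4f}, simulation {empirical:.4f}")
```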

Example 2

Suppose \scriptstyle X is a random variable with a cumulative distribution

 F_{X}(x) = P(X \leq x) = \frac{1}{(1 + e^{-x})^{\theta}}

where \scriptstyle \theta > 0 is a fixed parameter. Consider the random variable  \scriptstyle Y = \mathrm{log}(1 + e^{-X}). Then,

 F_{Y}(y) = P(Y \leq y) = P(\mathrm{log}(1 + e^{-X}) \leq y) = P(X > -\mathrm{log}(e^{y} - 1)).\,

The last expression can be calculated in terms of the cumulative distribution of X, so

 F_{Y}(y) = 1 - F_{X}(-\mathrm{log}(e^{y} - 1)) \,
 = 1 - \frac{1}{(1 + e^{\mathrm{log}(e^{y} - 1)})^{\theta}}
 = 1 - \frac{1}{(1 + e^{y} - 1)^{\theta}}
 = 1 - e^{-y \theta}.\,
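The result of Example 2 says that Y follows an exponential distribution with rate θ. A simulation sketch can check this: X is drawn by inverting its cumulative distribution function, and θ = 2, the seed and the sample size are assumptions made for the example.

```python
import math
import random

def sample_X(theta, rng):
    """Inverse-transform sample from F_X(x) = (1 + exp(-x)) ** (-theta)."""
    u = rng.random()
    while u == 0.0:  # keep u strictly inside (0, 1)
        u = rng.random()
    # Solving u = (1 + exp(-x)) ** (-theta) gives exp(-x) = u ** (-1/theta) - 1,
    # and u ** (-1/theta) - 1 equals expm1(-log(u) / theta).
    return -math.log(math.expm1(-math.log(u) / theta))

theta = 2.0
rng = random.Random(0)
ys = [math.log1p(math.exp(-sample_X(theta, rng))) for _ in range(100_000)]

print("mean of Y:", sum(ys) / len(ys), "vs 1/theta =", 1.0 / theta)
for y in (0.25, 1.0):
    empirical = sum(v <= y for v in ys) / len(ys)
    formula = 1 - math.exp(-theta * y)
    print(f"F_Y({y}): simulation {empirical:.4f}, formula {formula:.4f}")
```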

Equivalence of random variables

There are several different senses in which random variables can be considered to be equivalent. Two random variables can be equal, equal almost surely, or equal in distribution.

The precise definitions of these notions of equivalence are given below, in increasing order of strength.

Equality in distribution

If the sample space is a subset of the real line, random variables X and Y can be defined to be equal in distribution if they have the same distribution functions:

\operatorname{P}(X \le x) = \operatorname{P}(Y \le x)\quad\hbox{for all}\quad x.

Two random variables having equal moment generating functions have the same distribution. This provides, for example, a useful method of checking equality of certain functions of i.i.d. random variables. However, the moment generating function exists only for distributions for which E[e^{tX}] is finite for all t in a neighbourhood of zero.

Almost sure equality

Two random variables X and Y are equal almost surely if, and only if, the probability that they are different is zero:

\operatorname{P}(X \neq Y) = 0.

For all practical purposes in probability theory, this notion of equivalence is as strong as actual equality. It is associated to the following distance:

d_\infty(X,Y)=\mathrm{ess } \sup_\omega|X(\omega)-Y(\omega)|,

where "ess sup" represents the essential supremum in the sense of measure theory.

Equality

Finally, the two random variables X and Y are equal if they are equal as functions on their measurable space:

X(\omega)=Y(\omega)\qquad\hbox{for all }\omega.
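The three notions can be told apart on a small finite example. In the sketch below the three-point sample space, the measure P (which gives the outcome "c" probability zero) and the variables X, Y, Z are all assumptions constructed for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed three-point sample space; outcome "c" has probability zero.
P = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}

X = {"a": 0, "b": 1, "c": 0}
Y = {"a": 1, "b": 0, "c": 0}  # swaps the values of X on "a" and "b"
Z = {"a": 0, "b": 1, "c": 5}  # differs from X only on the null outcome "c"

def distribution(V):
    """Probability mass function of V, ignoring values of probability zero."""
    masses = defaultdict(Fraction)
    for omega, prob in P.items():
        masses[V[omega]] += prob
    return {value: mass for value, mass in masses.items() if mass > 0}

def equal_in_distribution(U, V):
    return distribution(U) == distribution(V)

def equal_almost_surely(U, V):
    # P(U != V) is the total mass of the outcomes where the values differ.
    return sum(prob for omega, prob in P.items() if U[omega] != V[omega]) == 0

def equal_everywhere(U, V):
    return all(U[omega] == V[omega] for omega in P)

for name, V in (("Y", Y), ("Z", Z)):
    print(name, equal_in_distribution(X, V),
          equal_almost_surely(X, V), equal_everywhere(X, V))
```

Here Y is equal to X in distribution but not almost surely, while Z is equal to X almost surely but not as a function on the sample space.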

Convergence

A significant theme in mathematical statistics consists of obtaining convergence results for certain sequences of random variables; for instance the law of large numbers and the central limit theorem.

There are various senses in which a sequence (Xn) of random variables can converge to a random variable X. These are explained in the article on convergence of random variables.

References

  1. ^ Rice, John (1999). Mathematical Statistics and Data Analysis. Duxbury Press. ISBN 0534209343. 
  2. ^ Fristedt & Gray (1996, page 11)

Literature

  • Fristedt, Bert; Gray, Lawrence (1996). A modern approach to probability theory. Boston: Birkhäuser. ISBN 3-7643-3807-5. 
  • Kallenberg, O., Random Measures, 4th edition. Academic Press, New York, London; Akademie-Verlag, Berlin (1986). MR0854102 ISBN 0-12-394960-2
  • Kallenberg, O., Foundations of Modern Probability, 2nd edition. Springer-Verlag, New York, Berlin, Heidelberg (2001). ISBN 0-387-95313-2
  • Papoulis, Athanasios (1965). Probability, Random Variables, and Stochastic Processes, 9th edition. McGraw–Hill Kogakusha, Tokyo. ISBN 0-07-119981-0.
  • Anderson, Sweeney, Williams, Freeman, Shoesmith. Statistics for Business and Economics - 2nd Edition. Cengage Learning (2010). ISBN 978-1-4080-1810-1

This article incorporates material from Random variable on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.